Regional Information Capacity of the Linear Time-Varying Channel

Brendan Farrell, Student Member, IEEE, Thomas Strohmer 






Abstract: We determine the information capacity of the linear, time-varying communications channel with additive white Gaussian noise for transmission signals with support approximately restricted to closed regions of the time and frequency domains. We address the two-part problem of first constructing appropriate transmission functions and second determining the mutual information. Our approach provides a signaling set that is adaptive to the time and frequency stability of the channel, and we use this set to estimate the channel's information capacity. In the limiting regime, this approach recovers the time-invariant capacity up to a redundancy factor.



I. Introduction 

A. The Time-Varying Channel 

We address the information capacity of the linear time- 
varying channel, given by the time-varying convolution 



$$r(t) = \int h(t, t-\tau)\, s(\tau)\, d\tau. \tag{1}$$



We determine the maximum mutual information between the 
input and the output of the system, in terms of the time-varying 
impulse response h. 

There are two ways to approach the information capacity 
of a channel; one may consider conditional probabilities [1], 
[2], or one may use the singular values of the channel's matrix 
representation [3], [4]. Here we take the latter approach. 

With appropriate conditions on $h$, we view equation (1) as a bounded operator mapping $L^2(\mathbb{R}) \to L^2(\mathbb{R})$. However, in order to use classical information theory tools, we must view the channel (1) as a map $X \to L^2(\mathbb{R})$, where $X$ is a finite dimensional subspace of $L^2(\mathbb{R})$. If $\{e_k\}_{k \in \mathbb{Z}}$ is an orthonormal basis for $L^2(\mathbb{R})$ and $\{e_l\}_{l=1}^{n}$ is an orthonormal basis for $X$, then we represent the channel by setting

$$A_{k,l} = \Big\langle \int h(\cdot, \cdot - \tau)\, e_l(\tau)\, d\tau,\; e_k \Big\rangle,$$

for $l = 1, \dots, n$, $k \in \mathbb{Z}$. Then the normalized information capacity of the channel mapping $X \to L^2(\mathbb{R})$ by (1) with additive Gaussian noise $\sim \mathcal{N}(0, \eta^2 I)$ is

$$\frac{1}{n} \sum_{k=1}^{n} \log\Big(1 + \frac{\lambda_k(A^*A)}{\eta^2}\Big) \text{ bits,} \tag{2}$$

where $\lambda_k(A^*A)$ denotes the $k$-th eigenvalue of $A^*A$.
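As a rough numerical illustration of the quantity in (2), the following sketch computes the normalized sum from the singular values of a finite matrix standing in for $A$. The function name, the base-2 logarithm, and the truncation of the infinite row index to a finite matrix are assumptions made for illustration only.

```python
import numpy as np

def normalized_information_capacity(A, noise_var):
    """Average of log2(1 + lambda_k(A^* A) / eta^2) over the n input dimensions.

    A         : (m, n) array, a finite (truncated) matrix representation of the channel
    noise_var : eta^2, the noise variance
    """
    A = np.asarray(A, dtype=complex)
    n = A.shape[1]
    # Squared singular values of A are the nonzero eigenvalues of A^* A.
    sing = np.linalg.svd(A, compute_uv=False)
    eigs = np.zeros(n)
    eigs[: sing.size] = sing ** 2
    return float(np.sum(np.log2(1.0 + eigs / noise_var)) / n)

# Toy channel whose matrix A^* A has eigenvalues 1 and 3, with unit noise variance.
A = np.diag([1.0, np.sqrt(3.0)])
print(normalized_information_capacity(A, noise_var=1.0))
```

For this toy matrix the two terms are $\log_2 2$ and $\log_2 4$, so the average is 1.5 bits.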



In the time-invariant setting, the limit of (2) is known to converge due to a theorem of Szegő [5]. For other channels, in particular for the time-varying channel, the analogous expression to equation (2) does not a priori converge. Convergence questions are, therefore, one reason to first focus on the singular values of the time-varying channel when restricted to a finite dimensional subspace of $L^2(\mathbb{R})$.

A second reason for considering the restriction of the channel to a finite dimensional subspace is that if the function $h$ in equation (1) has mild enough characteristics to be viewed as a function rather than a distribution, then the channel (1) is given by a compact operator. Hence, the only accumulation point of its spectrum is 0, and a limit analogous to (2) would converge to 0. This would occur even though the channel may be very robust when restricted to a finite dimensional subspace.

We consider the channel when restricted to a finite dimensional subspace and associate that space to a region of the time-frequency plane. The information capacity is then determined by the singular values of the matrix representation of the channel. We relate the singular values of this matrix to samples of a function derived from $h$ in (1), analogously to samples of $|\hat{h}(\omega)|^2$ in the time-invariant case. In particular, we estimate the information capacity of the linear map $X \to L^2(\mathbb{R})$ for an appropriate space $X$ in terms of a function determined by the time-varying impulse response $h$.

The difficulty here is determining an appropriate space $X$ or, equivalently, an appropriate set of signaling functions. It is well known that functions with certain time-frequency localization are approximate eigenfunctions of the time-varying channel. Moreover, it is natural to ask what the achievable information capacity is for the channel (1) if the signals are restricted to a certain time-frequency region. Additionally, we impose the requirement of a structure on the set of signals.

Our work merges two aspects of research on time-varying channels. Previous authors have discussed diagonalizing the channel and giving the capacity in terms of singular values [6], [7], [8], and other authors have focused on the ideal transmission signals [9], [10], [11]. Much of the mathematical approach to time-varying channels from a time-frequency analysis perspective originated with Kozek [9], [12], [13]. While he addresses issues such as the composition and estimation of time-varying channel operators and the time-frequency localization of transmission signals, his focus is a WSSUS model. Here we work with a deterministic channel and rigorously relate the channel's singular values and the signaling functions.

B. Summary of Results

Our focus is on the physical layer of the channel as given by the model (1). We show that a signaling set with appropriate time-frequency localization exists and show how it can be obtained. We use this signaling set to approximate the singular values of the matrix representation of the channel by its diagonal entries and then further approximate the eigenvalues by samples of a spreading function corresponding to the channel's pseudodifferential operator representation. We then approximate the information capacity of the channel for this system. These results hold for an entire class of channels given by time-frequency localization. Lastly, we show that this framework recovers the classical time-invariant information capacity up to a redundancy factor.

Note that we speak of maximum mutual information or 
information capacity rather than capacity. To prove a complete 
capacity result, one must prove the existence of an appropriate 
code. Codes, in turn, require error estimates as the code length 
increases. In our setting, since there are finitely many trans- 
mission signals in any time-frequency region, the codewords 
have finite length. Hence, the focus is on information capacity, 
rather than capacity. 

II. Preliminaries and Channel Model 

A. Preliminaries 

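We must introduce several definitions before discussing the channel model. The modulation operator $M_\omega$ is $M_\omega f(t) = e^{2\pi i \omega t} f(t)$, and the translation operator $T_x$ is $T_x f(t) = f(t - x)$. These operators have the commutation relation $T_x M_\omega = e^{-2\pi i x \omega} M_\omega T_x$ [14]. The Fourier transform of $f \in L^2(\mathbb{R})$, denoted $\hat{f}$, is

$$\hat{f}(\omega) = \int_{\mathbb{R}} f(t)\, e^{-2\pi i \omega t}\, dt,$$

and the corresponding operator is denoted $\mathcal{F}$. For $f \in L^2(\mathbb{R}^2)$ the operator $\mathcal{F}_j$, $j = 1, 2$, is the Fourier transform taken in the $j$-th variable only.

The cross-ambiguity function of $f$ and $g \in L^2(\mathbb{R})$ is

$$A(f,g)(x,\omega) = \int_{\mathbb{R}} f\Big(t + \frac{x}{2}\Big)\, \overline{g\Big(t - \frac{x}{2}\Big)}\, e^{-2\pi i t \omega}\, dt.$$

The cross-Wigner distribution of $f$ and $g \in L^2(\mathbb{R})$ is

$$W(f,g)(x,\omega) = \int_{\mathbb{R}} f\Big(x + \frac{t}{2}\Big)\, \overline{g\Big(x - \frac{t}{2}\Big)}\, e^{-2\pi i t \omega}\, dt.$$

Note: Unless otherwise specified, all integrals are over $\mathbb{R}$. Throughout we assume that $\alpha, \beta > 1$. $C$ is used to denote a constant and can take different values in a calculation. For a real number $x$, let $(x)_+ = \max(0, x)$.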
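The commutation relation between translation and modulation can be checked numerically on the cyclic group $\mathbb{Z}_N$. The sketch below is a finite analogue, assuming integer shifts and cyclic translation; the helper names `modulate` and `translate` are illustrative and do not come from the text.

```python
import numpy as np

def modulate(f, omega):
    """Discrete modulation: (M_omega f)[n] = exp(2*pi*i*omega*n/N) * f[n]."""
    N = len(f)
    n = np.arange(N)
    return np.exp(2j * np.pi * omega * n / N) * f

def translate(f, x):
    """Cyclic translation: (T_x f)[n] = f[n - x]."""
    return np.roll(f, x)

rng = np.random.default_rng(0)
N, x, omega = 32, 5, 3
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# T_x M_omega f  versus  exp(-2*pi*i*x*omega/N) * M_omega T_x f
lhs = translate(modulate(f, omega), x)
rhs = np.exp(-2j * np.pi * x * omega / N) * modulate(translate(f, x), omega)
print(np.allclose(lhs, rhs))
```

On $\mathbb{Z}_N$ the phase factor becomes $e^{-2\pi i x \omega / N}$, which plays the role of $e^{-2\pi i x \omega}$ in the continuous relation.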

B. Channel Model 

We model the channel as

$$r(t) = \iint \hat{\sigma}(\omega, x)\, M_\omega T_x s(t)\, d\omega\, dx. \tag{3}$$



Therefore, the received signal is a weighted collection of modulated and delayed copies of the transmitted signal. However, equation (3) can also be derived from the following time-varying convolution channel,

$$r(t) = \int h(t, t-\tau)\, s(\tau)\, d\tau, \tag{4}$$

where $h$ is a time-varying impulse response. By defining $\sigma(t,\omega) = \mathcal{F}_1 h(t, \cdot)$ and using several Fourier transforms, see [14], we have



$$\int h(t, t-\tau)\, s(\tau)\, d\tau = \iint \hat{\sigma}(\omega, x)\, M_\omega T_{-x} s(t)\, d\omega\, dx. \tag{5}$$



We therefore may equivalently view the channel as a Weyl pseudodifferential operator

$$L_\sigma s(t) = \iint \hat{\sigma}(\omega, x)\, e^{-\pi i x \omega}\, T_{-x} M_\omega s(t)\, d\omega\, dx. \tag{6}$$

The function $\hat{\sigma}$ is called the spreading function and $\sigma$ the symbol of the operator $L_\sigma$ [14]. Our assumption is that the spreading function decays exponentially; that is, there exist constants $\alpha, \beta, C > 0$ such that

$$|\hat{\sigma}(\omega, x)| \le C e^{-\beta|\omega| - \alpha|x|}. \tag{7}$$



Note first that equation (7) is more general than the common assumption of an underspread channel [15]. More importantly, the approach to the time-varying channel indicated by the assumption in equation (7) does not rely on the notion of coherence time. We do not make any assumptions that the channel is constant during a coherence time period, as is discussed, for example, in [16]. In fact, our channel may, as is physically sensible, be always smoothly evolving and never static.

Decay in the second variable of $\hat{\sigma}$ indicates that the time-varying impulse response decays exponentially. Since $\mathcal{F}_1[\hat{\sigma}(\cdot, x)]$ gives the evolution of the impulse response over time, decay in the first variable of $\hat{\sigma}$ indicates that the higher order derivatives with respect to $t$ of $h(t, \tau)$ decay rapidly; thus, $\sigma(t, \omega)$ evolves smoothly. A large $\beta$ indicates that the channel is stable, or evolves smoothly, in time, whereas a large $\alpha$ indicates that the channel is stable, or varies smoothly, in frequency. Also, as $\beta \to \infty$ the channel approaches the time-invariant regime. The basic approach of our work is to design a system that can adapt to the parameters $\alpha$ and $\beta$ to exploit either time or frequency stability. The next section establishes this framework.
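To make the channel model (3) and the decay condition (7) concrete, here is a minimal discrete sketch on $\mathbb{Z}_N$. It assumes integer Doppler shifts and delays, cyclic translation, and a spreading function that attains the envelope in (7) with equality; the function names and grid sizes are hypothetical.

```python
import numpy as np

def apply_spreading_channel(s, spreading, dopplers, delays):
    """r = sum over (omega, x) of spreading[i, j] * M_omega T_x s on Z_N."""
    s = np.asarray(s, dtype=complex)
    N = len(s)
    n = np.arange(N)
    r = np.zeros(N, dtype=complex)
    for i, omega in enumerate(dopplers):
        carrier = np.exp(2j * np.pi * omega * n / N)   # modulation M_omega
        for j, x in enumerate(delays):
            r += spreading[i, j] * carrier * np.roll(s, x)  # delayed copy T_x s
    return r

def exponential_spreading(dopplers, delays, alpha, beta, C=1.0):
    """Samples of C * exp(-beta*|omega| - alpha*|x|), the envelope allowed by (7)."""
    om = np.abs(np.asarray(dopplers, dtype=float))[:, None]
    xx = np.abs(np.asarray(delays, dtype=float))[None, :]
    return C * np.exp(-beta * om - alpha * xx)

dopplers = np.arange(-2, 3)      # Doppler (frequency) shifts
delays = np.arange(0, 4)         # delays (time shifts)
sigma_hat = exponential_spreading(dopplers, delays, alpha=1.5, beta=3.0)

s = np.zeros(16)
s[0] = 1.0                       # unit impulse as a test input
r = apply_spreading_channel(s, sigma_hat, dopplers, delays)
```

With a larger `beta` the Doppler taps die out faster and the operator approaches a pure (time-invariant) convolution, which mirrors the remark that $\beta \to \infty$ gives the time-invariant regime.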

III. Signaling Set and Frames 

A. Localized Signals 

Our goal is to construct a set of signals that have "most" of their mass supported in the time-frequency region $[0, T] \times [-W, W]$ and are approximate eigenfunctions of the channel. Lemma 3.6 below shows that functions with appropriate time-frequency concentration approximately diagonalize the pseudodifferential operator channel given by (6) when the spreading function $\hat{\sigma}$ satisfies the decay condition (7).

The uncertainty principle is the foundation of any study concerning the time-frequency properties of a signaling set [17], [14]. The uncertainty principle entered into the discussion of capacity in an article of Wyner [18], in which he establishes Shannon's capacity for the time-invariant channel using physically realizable signals.

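Theorem 3.1: If $f \in L^2(\mathbb{R})$, then for all $a, b \in \mathbb{R}$

$$\|f\|_2^2 \le 4\pi \Big( \int (x - a)^2 |f(x)|^2\, dx \Big)^{1/2} \Big( \int (\omega - b)^2 |\hat{f}(\omega)|^2\, d\omega \Big)^{1/2},$$

and equality holds if and only if $f$ is a (modulated, shifted, scaled) Gaussian.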

The simplest consequence of this theorem is that a nonzero function in $L^2(\mathbb{R})$ cannot have compact support in both time and frequency. Therefore, the set of functions with time-frequency support exclusively contained in $[0, T] \times [-W, W]$ contains only the zero function; moreover, the set of functions with "most" of their mass supported in $[0, T] \times [-W, W]$ is not a closed subspace of $L^2(\mathbb{R})$ and, hence, does not have an orthonormal basis.

A second criterion for the signaling set is that the functions 
be linearly independent. If this is not the case, then even 
in the absence of noise a received signal does not uniquely 
determine the transmitted linear combination of signals. Thus, 
the problem is to construct a linearly independent set of 
functions with the appropriate time-frequency localization. 
One approach with a long history is to use prolate spheroidal 
wave functions, as for example Wyner does [18]. However, 
we prefer the Weyl-Heisenberg transmission signals, as these 
functions offer exponential decay in time and frequency and 
yield better capacity estimates. 

There is a second consequence of the uncertainty principle 
that makes this task more difficult, namely the Balian-Low 
theorem [19]. 

Theorem 3.2: If $\{M_l T_k \phi\}_{k,l \in \mathbb{Z}}$ is a Riesz basis for $L^2(\mathbb{R})$, then either $x\phi(x) \notin L^2(\mathbb{R})$ or $\omega\hat{\phi}(\omega) \notin L^2(\mathbb{R})$.
That is, if the signaling set has the useful structural property 
of consisting of time-frequency shifts of an original function 
and is a linearly independent set, then it cannot be localized 
in both time and frequency. This theorem has often been 
overlooked in the literature (though a correct discussion may 
be found in [9]); some researchers have assumed the existence 
of an orthonormal basis (a type of Riesz basis) with time 
and frequency decay that violates the theorem. However, we 
may relax the condition that the closure of the signaling 
set span L 2 (R) and, in turn, obtain improved time-frequency 
localization. That is, we consider time-frequency shifts of an 
initial function that are orthonormal but the closure of which 
does not span L 2 (R). There are several steps necessary to 
prove that such a construction is possible; the first is to look 
at frames. 

B. Weyl-Heisenberg Frames 

Definition 3.3: [14] A subset $\{e_i\}_{i \in \Lambda}$ of a Hilbert space $H$ is a frame for $H$ if there exist positive constants $A$ and $B$ such that for all $f \in H$,

$$A \sum_{i \in \Lambda} |\langle f, e_i \rangle|^2 \le \|f\|^2 \le B \sum_{i \in \Lambda} |\langle f, e_i \rangle|^2.$$

$A$ and $B$ are called the frame bounds and a frame is tight if $A = B$. The frame operator $S$ is given by

$$S f = \sum_{i \in \Lambda} \langle f, e_i \rangle e_i.$$

Note that $S$ is a positive operator.

Definition 3.4: If $(\phi, a, b) = \{M_{bl} T_{ak} \phi\}_{k,l \in \mathbb{Z}}$, $a, b \in \mathbb{R}_+$, is a frame, then it is called a Gabor frame or Weyl-Heisenberg frame. The redundancy is $(ab)^{-1}$.
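For a finite frame the objects in Definition 3.3 reduce to plain linear algebra. The sketch below, with hypothetical helper names and a three-vector frame for $\mathbb{R}^2$ chosen only as an example, builds the frame operator, reads the extreme eigenvalues that control the frame bounds, and forms the canonical tight frame $S^{-1/2} e_i$.

```python
import numpy as np

def frame_operator(vectors):
    """Matrix of S f = sum_i <f, e_i> e_i, with the frame vectors e_i stored as rows."""
    E = np.asarray(vectors, dtype=complex)
    return E.T @ E.conj()

def canonical_tight_frame(vectors):
    """Apply S^{-1/2} to every frame vector; the new frame operator is the identity."""
    E = np.asarray(vectors, dtype=complex)
    w, V = np.linalg.eigh(frame_operator(E))      # S is Hermitian positive definite
    S_inv_sqrt = (V * w ** -0.5) @ V.conj().T     # V diag(w^{-1/2}) V^*
    return E @ S_inv_sqrt.T                        # row i becomes S^{-1/2} e_i

frame = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = frame_operator(frame)
eigs = np.linalg.eigvalsh(S)      # sum_i |<f, e_i>|^2 lies between min and max eig times ||f||^2
tight = canonical_tight_frame(frame)
print(eigs, np.allclose(frame_operator(tight), np.eye(2)))
```

Since $\sum_i |\langle f, e_i \rangle|^2 = \langle S f, f \rangle$, the constants in the inequality of Definition 3.3 can be taken as $A = 1/\lambda_{\max}(S)$ and $B = 1/\lambda_{\min}(S)$.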

C. Signaling Set 

In this section we design a signaling set $\Psi = \{\psi_{k,l}\}_{k,l \in \mathbb{Z}} \subset L^2(\mathbb{R})$ adaptively according to the time-frequency localization of the channel such that $\{\psi_{k,l}\}_{k,l \in \mathbb{Z}}$ are approximate eigenfunctions of $L_\sigma$. (Approximate eigenfunction means that the operator applied to the function is approximately a scalar times the function; it is not necessarily related to approximate spectrum.) To communicate using the time-frequency region $[0, T] \times [-W, W]$ one selects a subset $\Psi_{\mathcal{I}} = \{\psi_{k,l}\}_{(k,l) \in \mathcal{I}} \subset \Psi$ approximately satisfying $\mathrm{supp}(\psi_{k,l}) \subset [0, T]$ and $\mathrm{supp}(\hat{\psi}_{k,l}) \subset [-W, W]$ for all $(k, l) \in \mathcal{I}$. The transmitted signals then have the form $s(t) = \sum_{(k,l) \in \mathcal{I}} c_{k,l} \psi_{k,l}(t)$, and the model is

$$r(t) = L_\sigma s(t) + n(t). \tag{8}$$

The operator $L_\sigma$ expressed as a matrix is

$$A_{k,l,k',l'} = \langle L_\sigma \psi_{k',l'}, \psi_{k,l} \rangle, \quad (k,l) \in \mathcal{I},\; k', l' \in \mathbb{Z}. \tag{9}$$

The following proposition is critical: it states that a set $\Psi$ with the desired properties exists. The signals in $\Psi$ are approximate eigenfunctions of $L_\sigma$, and we use them to address the eigenvalues of the matrix $A^*A$, which in turn determine the information capacity. Moreover, the proof of Proposition 3.5 may be viewed as a method for obtaining these signals.

Proposition 3.5: Let g s (t) = (2s) _1 / 4 e _ ^ t , and set ip s = 
S-V 2 g s . Then (</> s , f , f ) = (A,pa, pb) (ab = 1 and p > 1) 
is an orthonormal system and there exist constants C > and 
< D < 1 such that 

\ip s {t)\ < Ce-°^ Vtel 

\Mlu)\ < Ce~ D ™ lujl VweR. 
Note here that the redundancy of $(\psi_s, \frac{a}{\rho}, \frac{b}{\rho})$ is $\rho^2$. The Weyl-Heisenberg set $(\psi_s, \rho a, \rho b)$ consists of time-frequency shifts of the window function $\psi_s$ on the time-frequency lattice $a\rho\mathbb{Z} \times b\rho\mathbb{Z}$. Since $ab = 1$, the density of this lattice is $\frac{1}{\rho^2}$. Clearly, the closer $\rho$ moves to 1, the more linearly independent functions get assigned to a rectangle of the time-frequency plane. Yet the Balian-Low theorem states that as $\rho \searrow 1$, the time-frequency localization is lost, which for this system means $D \to 0$. Additionally, the constant $C$ increases dramatically. These effects are caused by the operator $S^{-1/2}$, which becomes increasingly poorly conditioned as the density approaches 1. A detailed discussion of the density parameter $\rho$, however, is beyond the scope of this paper.

We give the proof of Proposition 3.5 here because it provides insight into the signaling system; the proofs of all other theorems are given in Appendix I.

Proof: A fundamental theorem due to Lyubarskii, Seip and Wallstén states that $(g_s, \frac{a}{\rho}, \frac{b}{\rho})$ is a frame for $L^2(\mathbb{R})$ if and only if $\frac{ab}{\rho^2} < 1$ [20], [21], [22]. By Theorem 5.1.6 and Corollary 7.3.2 in [14], $(S^{-1/2} g_s, \frac{a}{\rho}, \frac{b}{\rho}) = (\psi_s, \frac{a}{\rho}, \frac{b}{\rho})$ is a tight frame for $L^2(\mathbb{R})$ with frame constant $\rho^2$. By an essential result due to several authors [23], [24], [25], $(\psi_s, \frac{\rho}{a}, \frac{\rho}{b}) = (\psi_s, \rho b, \rho a)$ ($ab = 1$) is then an orthonormal set (which does not span $L^2(\mathbb{R})$). By Theorem 5 in [26], up to a factor $0 < D < 1$, the exponential decay of $g_s$ and $\hat{g}_s$ is preserved in $\psi_s$ and $\hat{\psi}_s$. Finally, Theorem IV.2 in [27] implies that if $\psi_1$ is the window function for the orthonormal set based on the initial window $g_1$, then $\psi_s$ is the corresponding window function for $g_s$. ■
We use the parameters s, a and b to control the time and 
frequency localization of the signaling set and, hence, to adapt 
the signals to the channel. We will return to these parameters 
later. 
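The steps of the proof of Proposition 3.5 have a finite-dimensional analogue that can be tried numerically. The sketch below works on $\mathbb{Z}_N$ with a periodized Gaussian, which is an assumption standing in for $g_s$: it forms the frame operator of an oversampled Gabor system, applies $S^{-1/2}$ to the window, and checks that shifts of the rescaled window along the sparser (adjoint) lattice are orthonormal. The values of `N`, `a`, `b` and the helper `gabor_system` are illustrative choices, not taken from the text.

```python
import numpy as np

def gabor_system(window, time_step, freq_step):
    """Rows are M_{freq_step*l} T_{time_step*k} window on Z_N (cyclic shifts)."""
    N = len(window)
    n = np.arange(N)
    rows = []
    for k in range(N // time_step):
        shifted = np.roll(window, time_step * k)
        for l in range(N // freq_step):
            rows.append(np.exp(2j * np.pi * freq_step * l * n / N) * shifted)
    return np.array(rows)

N, a, b = 36, 4, 4                      # redundancy N / (a*b) = 2.25 > 1
n = np.arange(N)
dist = np.minimum(n, N - n)             # cyclic distance to the origin
g = np.exp(-np.pi * dist ** 2 / N)      # periodized Gaussian window

# Frame operator of the oversampled system and the tight window S^{-1/2} g.
G = gabor_system(g, a, b)
S = G.T @ G.conj()
w, V = np.linalg.eigh(S)
psi = ((V * w ** -0.5) @ V.conj().T) @ g

# Rescale by the square root of the redundancy and shift along the adjoint lattice.
psi_nc = np.sqrt(N / (a * b)) * psi
adjoint = gabor_system(psi_nc, N // b, N // a)
gram = adjoint @ adjoint.conj().T
print(np.allclose(gram, np.eye(len(adjoint)), atol=1e-8))
```

Here the oversampled lattice plays the role of $(\frac{a}{\rho}, \frac{b}{\rho})$ and the adjoint lattice the role of $(\rho a, \rho b)$; the 16 orthonormal vectors live in a 36-dimensional space, so they do not span it, in line with the remark in the proof.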

D. Transmitter and Receiver Structure 

We have two sets of functions: $\{M_{\frac{b}{\rho} l} T_{\frac{a}{\rho} k} \psi\}_{k,l \in \mathbb{Z}}$ is not linearly independent, but its closure spans $L^2(\mathbb{R})$; $\{M_{\rho b l} T_{\rho a k} \psi\}_{k,l \in \mathbb{Z}}$ is linearly independent, but its closure does not span $L^2(\mathbb{R})$. In order to preserve linear independence, the transmitter uses the second set to transmit data; however, the receiver uses the first set to detect the signal. We set:

. ^ c = Hs 

• $\psi^{c}_{k,l} = M_{\frac{b}{\rho} l} T_{\frac{a}{\rho} k} \psi^{c}$ ($ab = 1$)
• $\psi^{nc}_{k,l} = M_{\rho b l} T_{\rho a k} \psi^{nc}$ ($ab = 1$)

Here $c$ stands for "complete" and $nc$ stands for "not complete".

The transmitter maps the information-bearing coefficients in $\mathbb{C}$ to the orthonormal set $\{\psi^{nc}_{k,l}\}_{(k,l) \in \mathcal{I}}$ and transmits $s(t) = \sum_{(k,l) \in \mathcal{I}} c_{k,l} \psi^{nc}_{k,l}$ for the index set $\mathcal{I}$. It is possible that the channel moves signals from the span of the non-spanning set $\{\psi^{nc}_{k,l}\}_{(k,l) \in \mathcal{I}}$ into its complement, and so if the receiver used the non-spanning set for detection, these signals would not be captured. Therefore, the receiver uses the spanning set $\{\psi^{c}_{k,l}\}_{k,l \in \mathbb{Z}}$ to detect the signal. We introduce the coefficient or analysis operator $C^{c} : L^2(\mathbb{R}) \to \ell^2(\mathbb{Z}^2)$, given by

$$(C^{c} f)_{k,l} = \langle f, \psi^{c}_{k,l} \rangle,$$

and its adjoint $C^{c*} : \ell^2(\mathbb{Z}^2) \to L^2(\mathbb{R})$, given by

$$C^{c*} \{a_{k,l}\}_{k,l \in \mathbb{Z}} = \sum_{k,l \in \mathbb{Z}} a_{k,l} \psi^{c}_{k,l}.$$

Note that $C^{c*} C^{c} = S^{c}$. If the receiver functions $\{\psi^{c}_{k,l}\}_{k,l \in \mathbb{Z}}$ are not rescaled appropriately, for example if we use $\psi^{c} = \psi_s$, then the receiver will amplify the received signal. By setting $\psi^{c} = \frac{1}{\rho} \psi_s$, $(\frac{1}{\rho}\psi_s, \frac{a}{\rho}, \frac{b}{\rho})$ is a tight frame with redundancy $\rho^2$, and $\|S^{c}\| = \text{redundancy} \times \|\psi^{c}\|^2 = 1$ [14], [19]. Therefore $S^{c}$ is an isometry on $L^2(\mathbb{R})$ and, hence, $n(t)$ and $S^{c} n(t)$ have the same power spectral density. The channel may now be viewed as a map

$$C^{c} L_\sigma C^{nc*} : \ell^2(\mathcal{I}) \to \ell^2(\mathbb{Z}^2). \tag{10}$$



This is expressed as a matrix as

$$A_{klk'l'} = \langle L_\sigma \psi^{nc}_{k'l'}, \psi^{c}_{kl} \rangle. \tag{11}$$

Now the capacity of the channel restricted to the span of $\{\psi^{nc}_{k'l'}\}_{(k',l') \in \mathcal{I}}$ is given by

(k,i)ex v ' ' 

The operator $A^*A$ is the matrix representation of

$$C^{nc} L_\sigma^* C^{c*} C^{c} L_\sigma C^{nc*} = C^{nc} L_\sigma^* S^{c} L_\sigma C^{nc*} \tag{13}$$

$$= C^{nc} L_\sigma^* L_\sigma C^{nc*}. \tag{14}$$



E. Approximate Diagonalization 

The fact that the signals presented in Proposition 3.5 approximately diagonalize the operator $L_\sigma$ is the foundation for this entire work. This fact is made precise in the following lemma.

Lemma 3.6: Let V = S^ 1 / 2 ^)- 1 / 4 ^ e and ip^ = 
M r t,iT ra kip for positive constants a, b, ab = 1. Assume that 
the decay of $\hat{\sigma}$ is given by

$$|\hat{\sigma}(\omega, x)| \le C e^{-\beta|\omega| - \alpha|x|}.$$

Then

x (^ e - aar \-^ k - k '\ _j_ e - 2L T lar \-^- k - k '\y 

IV. Regional Information Capacity 

A. Information Capacity with Receiver Channel Knowledge 

The previous section established the existence of signals that 
approximately diagonalize the time-varying channel. Now we 
use the localization of these signals to a region of the time- 
frequency plane to determine the channel capacity specific to 
that region. Moreover, we relate the capacity to samples in that 
region of an "auto-convolved" form of the channel's spreading 
function. 

Let $R = [T_1, T_2] \times [W_1, W_2]$, $T_1 < T_2$, and $W_1 < W_2$, denote a region of the time-frequency plane. Without loss of generality we take $R = [0, T] \times [-W, W]$. Let the channel be given by $r = L_\sigma s + n$, $n \sim \mathcal{N}(0, \eta^2)$, where $\hat{\sigma}$ satisfies the decay condition (7). We consider the following regime:

PI) = S- 1 /2 2 -l/4(|)-l/2 e — 

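P2) the signals are contained in the finite dimensional space

$$\Psi = \mathrm{span}\{\psi_{k,l}\}_{k=0,\, l=-L}^{K,\, L} = \mathrm{span}\{M_{\rho\frac{\alpha}{\beta} l}\, T_{\rho\frac{\beta}{\alpha} k}\, \psi\}_{k=0,\, l=-L}^{K,\, L}$$

P3) $K = \lfloor \frac{T\alpha}{\rho\beta} \rfloor$ and $L = \lfloor \frac{W\beta}{\rho\alpha} \rfloor$.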

Thus, while the following results are specific to the system just described, the system is sensible. This is evidenced by the fact that these signals are approximate eigenfunctions of $L_\sigma$ (Lemma 3.6), that the signals have structure and an algorithm for their construction (Proposition 3.5), and that they allow the capacity to be approximated by the sample values of an "auto-convolved" Weyl symbol (Theorem 4.1).

We assume that the receiver has perfect channel state information (CSIR), and that the transmitter knows only the decay parameters $\alpha$ and $\beta$ and the rectangle $[0, T] \times [-W, W]$. Once the space given by property P2 is fixed, the matrix $A$ is determined, and the channel may be viewed as a standard matrix channel. Denoting the information capacity $\mathcal{IC}_{\sigma,R}$, we have

$$\mathcal{IC}_{\sigma,R} = \sum_{k=0,\, l=-L}^{K,\, L} \log\Big(1 + \frac{\lambda_{kl}(A^*A)}{\eta^2}\Big).$$

Theorem 4.1 provides an estimate for this quantity in terms of a function $S$ that is derived from the time-varying impulse response $h$ or, equivalently, the Weyl symbol $\sigma$.

Now we relate the eigenvalues of $A^*A$ to the Weyl symbol $\sigma$. The relationship between the spectral values and the sample values of the symbol is one of the difficulties of the time-varying channel and is a property that strongly distinguishes it from the time-invariant channel. For a time-invariant channel given by impulse response $h_1$, the spectrum is given exactly by the values that $\hat{h}_1$ takes. In the time-varying setting, such a simple characterization does not exist. For example, a Weyl pseudodifferential operator may have all positive eigenvalues, yet its symbol may take negative values; an operator may also have a strictly positive symbol, yet not be a positive operator [28]. Thus, in many ways the spectral properties of time-varying convolution operators are much more complicated than their time-invariant counterparts. Yet, we are still able to approximate spectral values of the time-varying convolution operators by sample values of the appropriate symbol. The relationship that we prove here has been discussed by other authors [8], [11], but it had not yet been rigorously proved.

The composition of two Weyl pseudodifferential operators is given by $L_\sigma L_\tau = L_{\sigma \sharp \tau}$, where $\sigma \sharp \tau = \mathcal{F}^{-1}(\hat{\sigma} \natural \hat{\tau})$ is called the twisted product of $\sigma$ and $\tau$, and
(&\\t)(u>, x) = // <t(o/, x')f (lo — a/, x — x')e~ v w ~ bJX ^ dui 1 dx 



is called the twisted convolution of $\hat{\sigma}$ and $\hat{\tau}$. We have that $\sigma(t, \omega) = \mathcal{F}_1 h(t, \cdot)$, and we set $S = \sigma \sharp \bar{\sigma}$, the Fourier transform of the "twisted auto-convolution" of $\hat{\sigma}$. The adjoint operator is $L_\sigma^* = L_{\bar{\sigma}}$ [14]. Therefore, since $L_\sigma^* L_\sigma$ is self adjoint, $S$ is a real-valued function. Lemma 1.2 shows that the appropriate samples of $S$ are approximately the diagonal entries of $A^*A$, which are real and positive. However, if $(A^*A)_{klkl}$ is small, the corresponding sample of $S$ may, in fact, be negative. We, therefore, work with $S_+(x, \omega) = (S(x, \omega))_+$ so that negative values do not enter into the log function.

The channel capacity is then approximately given by the samples of $S$, analogously to samples of $|\hat{h}_1|^2$ giving the capacity of a time-invariant channel with impulse response $h_1$. Moreover, these samples are all taken in the region of the time-frequency plane in which the signaling functions are localized. We, therefore, have capacity in terms of two physical properties: the time-varying transfer function and the time-frequency region.
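The idea of replacing eigenvalues by samples of $S_+$ can be sketched as follows. The lattice spacing $\rho\frac{\beta}{\alpha}$ in time and $\rho\frac{\alpha}{\beta}$ in frequency, the index ranges, the unnormalized sum, and the base-2 logarithm reflect our reading of properties P2 and P3 and are assumptions; the callable `S` is a hypothetical stand-in for the symbol.

```python
import math

def regional_capacity_estimate(S, T, W, alpha, beta, rho, noise_var):
    """Sum of log2(1 + S_+(x_k, w_l) / eta^2) over lattice points in [0, T] x [-W, W].

    S is any callable S(x, omega) returning a real number.
    """
    K = math.floor(T * alpha / (rho * beta))    # number of time steps (assumed form of P3)
    L = math.floor(W * beta / (rho * alpha))    # number of frequency steps per side
    total = 0.0
    for k in range(0, K + 1):
        for l in range(-L, L + 1):
            sample = S(rho * beta / alpha * k, rho * alpha / beta * l)
            # S_+ = max(S, 0) keeps negative samples out of the logarithm.
            total += math.log2(1.0 + max(sample, 0.0) / noise_var)
    return total

# A constant symbol S = 3 with unit noise gives log2(4) = 2 bits per lattice point.
estimate = regional_capacity_estimate(lambda x, w: 3.0, T=4.0, W=2.0,
                                      alpha=2.0, beta=2.0, rho=2.0, noise_var=1.0)
print(estimate)
```

With these parameters there are $3 \times 3 = 9$ lattice points, so the estimate is 18 bits; only evaluations of $S$ are needed, not the matrix $A^*A$ or its eigenvalues.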

Theorem 4.1 (Information Capacity for CSIR): Let $R$ be a closed rectangular region of the time-frequency plane. Assume the channel is given by $r = L_\sigma s + n$, $n \sim \mathcal{N}(0, \eta^2)$, that $\hat{\sigma}$ satisfies the decay condition (7), and that the receiver has perfect channel knowledge, while the transmitter knows only $\alpha$, $\beta$ and $R$. Also assume the signaling set is given according to properties P1-P3 above. Set $S = \sigma \sharp \bar{\sigma}$ and $S_+(x, \omega) = (S(x, \omega))_+$. Then



K.L 



k=0,l=-L 



< 2ifLlog(l + 0{e-^ if)+a) + 



{a[3D)- 



B. Information Capacity with Transmitter Channel Knowledge 

Now we treat the same channel as discussed in the previous 
section under the assumption that both the transmitter and 
receiver have perfect channel knowledge (CSIT). The space 
given by P2 as in the previous section is identical, and again 
the channel may then be viewed as a standard matrix channel. 
Since the transmitter and receiver both have perfect channel 
knowledge, the transmitter will use the eigenvectors of A* A 
for transmission and allocate power according to water-filling 
[4]. We denote the information capacity for the CSIT regime 
by IC™ R , and we have 



1C 



K.L 

t.r = 

k=0,l=-L 
i K.L 



log 1 



P kl X kl {A*A) 
rj 2 



(15) 



where $\{P_{kl}\}_{k=0,\, l=-L}^{K,\, L}$ denotes the power allocated to the eigenvectors and $\sum_{k=0,\, l=-L}^{K,\, L} P_{kl} = P$ is the total power available.

Theorem 4.1 shows that the eigenvalues of $A^*A$ are approximated by the sample values of $S$ (this is stated explicitly as Lemma 1.3). Therefore, rather than computing first the matrix $A^*A$ and its eigenvalues, we can approximate these eigenvalues by just sampling the symbol $S$. In order to approximate the CSIT information capacity we require approximations for the power allocations $\{P_{kl}\}_{k=0,\, l=-L}^{K,\, L}$. Again, rather than computing these from the eigenvalues of $A^*A$, we allocate power based on the approximate eigenvalues $\{S_+(\rho\frac{\beta}{\alpha}k, \rho\frac{\alpha}{\beta}l)\}_{k=0,\, l=-L}^{K,\, L}$.
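Water-filling over a list of (approximate) eigenvalues can be sketched in a few lines. This is the standard textbook procedure, written here under the assumption that each mode with eigenvalue $\lambda$ has noise floor $\eta^2 / \lambda$; the function names are hypothetical, and the eigenvalues passed in could equally be samples of $S_+$.

```python
import numpy as np

def water_filling(eigenvalues, total_power, noise_var):
    """Power allocation max(level - noise_var / lambda_i, 0) summing to total_power."""
    lam = np.asarray(eigenvalues, dtype=float)
    power = np.zeros_like(lam)
    usable = np.flatnonzero(lam > 0)            # modes with zero gain get no power
    if usable.size == 0:
        return power
    floors = noise_var / lam[usable]
    order = np.argsort(floors)                  # best modes (lowest floor) first
    sorted_floors = floors[order]
    # Shrink the active set until the water level exceeds the highest active floor.
    for m in range(sorted_floors.size, 0, -1):
        level = (total_power + sorted_floors[:m].sum()) / m
        if level > sorted_floors[m - 1]:
            break
    power[usable[order]] = np.maximum(level - sorted_floors, 0.0)
    return power

def csit_information(eigenvalues, power, noise_var):
    """Sum of log2(1 + P_i * lambda_i / eta^2), the form appearing in (15)."""
    lam = np.asarray(eigenvalues, dtype=float)
    return float(np.sum(np.log2(1.0 + np.asarray(power) * lam / noise_var)))

approx_eigs = [1.0, 0.5, 0.0]
P = water_filling(approx_eigs, total_power=3.0, noise_var=1.0)
print(P, csit_information(approx_eigs, P, noise_var=1.0))
```

In this example the noise floors are 1 and 2, the water level is 3, and the allocation is (2, 1, 0), which uses the full power budget of 3.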

Theorem 4.2 (Information Capacity for CSIT): Let $R$ be a closed rectangular region of the time-frequency plane. Assume the channel is given by $r = L_\sigma s + n$, $n \sim \mathcal{N}(0, \eta^2)$, that $\hat{\sigma}$ satisfies the decay condition (7), and that both the transmitter and receiver have perfect channel knowledge. Also assume the signaling set is given according to properties P1-P3 above. Set $S = \sigma \sharp \bar{\sigma}$ and $S_+(x, \omega) = (S(x, \omega))_+$. Let $P$ be the total power available to the transmitter. Then



Tf™ - 



K.L 

E 

k=0,l=- 



log 1 



<2KLlog{l + 2KL- 0(e 



-SG3+") 



(a(5D) 



r)), 



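where $\{P_{kl}\}_{k=0,\, l=-L}^{K,\, L}$, $\sum_{k=0,\, l=-L}^{K,\, L} P_{kl} = P$, denotes the water-filling allocation based on the approximate eigenvalues $\{S_+(\rho\frac{\beta}{\alpha}k, \rho\frac{\alpha}{\beta}l)\}_{k=0,\, l=-L}^{K,\, L}$.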

V. Recovery of Time-Invariant Capacity

In the previous section we assumed the channel spreading function satisfied

$$|\hat{\sigma}(\omega, x)| \le C e^{-\beta|\omega| - \alpha|x|}.$$

We now consider a sequence of time-varying channels approaching the time-invariant channel. We assume the sequence of Weyl symbols has the following properties:

P4) $|\hat{\sigma}_n(\omega, x)| \le C_n e^{-\beta_n|\omega| - \alpha|x|}$. Note that $\alpha$ is fixed for all $n \in \mathbb{N}$.

P5) there exist positive constants $M$ and $m$ such that

$$M \ge \iint |\hat{\sigma}_n(\omega, x)|\, d\omega\, dx \ge m \quad \forall\, n \in \mathbb{N}.$$

P6) $\beta_n > \beta_{n'}$ for $n > n'$ and $\lim_{n \to \infty} \beta_n = \infty$.

If the sequence $\{\hat{\sigma}_n(\omega, x)\}_{n \in \mathbb{N}}$ satisfies these properties, then $\hat{\sigma}_n(\omega, x) \to \delta(\omega) h(x)$, and therefore $\sigma_n(x, \omega) \to \hat{h}(\omega)$. The generalized eigenvectors of a time-invariant channel are pure frequencies. So as the channel becomes more stable, the eigenvectors approach pure frequencies. One then exploits this stability by transmitting signals closer to pure frequencies, which means transmitting longer signals. We assume that the channel is band-limited to $[-W, W]$.
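For comparison with the limiting regime, the classical normalized time-invariant quantity $\frac{1}{2W}\int_{-W}^{W} \log(1 + |\hat{h}(\omega)|^2 / \eta^2)\, d\omega$ can be approximated by a midpoint rule. The sketch below is only that numerical integral; the function name, the base-2 logarithm, and the grid size are assumptions, and the redundancy factor discussed in the text is not applied.

```python
import numpy as np

def time_invariant_information_capacity(h_hat, W, noise_var, num=4096):
    """Midpoint-rule value of (1 / 2W) * integral over [-W, W] of log2(1 + |h_hat|^2 / eta^2).

    h_hat : callable accepting an array of frequencies and returning the transfer function
    """
    edges = np.linspace(-W, W, num + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    vals = np.log2(1.0 + np.abs(h_hat(mid)) ** 2 / noise_var)
    # With uniform cells, the integral divided by 2W is just the mean of the samples.
    return float(vals.mean())

# Flat transfer function with |h_hat|^2 = 3 and unit noise: log2(4) = 2 bits.
flat = lambda w: np.sqrt(3.0) * np.ones_like(w)
print(time_invariant_information_capacity(flat, W=1.0, noise_var=1.0))
```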

Theorem 5.1 (Time-Invariant Capacity for CSIR): Let 
{c„}„eN be a sequence of Weyl symbols converging to 
the linear time-invariant channel given by the impulse 
response h(x) according to properties P4-P6. Set 
= [0) if] x [ — Wj W], and let the noise and signaling 
system be as in Theorem 14.11 Then 



lim 



1 



K,L 



log 1 



P w j_ w log \ l 



\h{w)\ 



du>. 



2pWj_ w -\ rr 
Theorem 5.2 (Time-Invariant Capacity for CSIT): We make the same hypotheses and definitions as for Theorem 5.1 except that the transmitter now also has perfect channel knowledge and total power resources $P_{Total}$. Let $\{P^{n}_{kl}\}_{k=0,\, l=-L}^{K,\, L}$ denote the water-filling power allocations based on the approximate eigenvalues $S_+(\rho\frac{\beta}{\alpha}k, \rho\frac{\alpha}{\beta}l)$. Then




where $P(\omega)$ is the power allocation determined by water-filling, as in the classical time-invariant case [4], and

$$\int_{-W}^{W} P(\omega)\, d\omega = P_{Total}.$$



Appendix I 
Proofs of Theorems 



Proof: [Proof of Lemma 3.6] The following two essential identities hold for Weyl pseudodifferential operators [14]:

$$\langle L_\sigma f, g \rangle = \langle \sigma, W(g, f) \rangle \tag{16}$$

$$|\langle L_\sigma T_u M_\eta f,\; T_v M_\gamma g \rangle| = |(\hat{\sigma} * A(f, g))(u - v,\; \eta - \gamma)|. \tag{17}$$

We define the function $\phi$ for $c_1, c_2 > 1$ and will use the following bound:

$$\phi(c_1, c_2; X) = \int e^{-c_1|y|}\, e^{-c_2|X - y|}\, dy = \frac{1}{c_1 + c_2}\big(e^{-c_1|X|} + e^{-c_2|X|}\big) + \frac{1}{c_2 - c_1}\big(e^{-c_1|X|} - e^{-c_2|X|}\big)$$



= 1 I XI , „-co|X| 



< Ce e " cl 
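The closed form of the integral $\int e^{-c_1|y|} e^{-c_2|X-y|}\, dy$ used above can be checked numerically. The sketch below compares it with a Riemann sum, assuming $c_1 \ne c_2$ (the closed form has a removable singularity at $c_1 = c_2$); the function names, the truncation interval, and the step size are illustrative choices.

```python
import numpy as np

def phi_closed_form(c1, c2, X):
    """(e1 + e2) / (c1 + c2) + (e1 - e2) / (c2 - c1) with e_j = exp(-c_j * |X|), c1 != c2."""
    ax = abs(X)
    e1, e2 = np.exp(-c1 * ax), np.exp(-c2 * ax)
    return (e1 + e2) / (c1 + c2) + (e1 - e2) / (c2 - c1)

def phi_numeric(c1, c2, X, half_width=40.0, step=1e-3):
    """Riemann-sum approximation of the integral of exp(-c1*|y| - c2*|X - y|) over y."""
    y = np.arange(-half_width, half_width + step, step)
    return float(np.sum(np.exp(-c1 * np.abs(y) - c2 * np.abs(X - y))) * step)

print(phi_closed_form(1.0, 2.0, 1.5), phi_numeric(1.0, 2.0, 1.5))
```

At $X = 0$ both expressions reduce to $2 / (c_1 + c_2)$, which gives a quick sanity check on the signs of the two terms.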

The system is given by tp s = S~ 1 / 2 g s , where g s (t) 
(2s) _1 / 4 e~5' , and by Proposition (13. 5b 

\M*)\ <Ce-* D W 
\tp s (uj)\ < Ce-*' D M. 

Lemma 12.11 implies 

\A{ip,ip)(cj,x)\ < Ce~* I, l w l-H a N. 

\Ak,i,k>,i>\ = |<L<r^fc'i', i>ki)\ 

= \(L a M pb i,T pak ^,M lbl Ti ak ^)\ 

P P 

= \(a * A{^^)){a{-k - pk'),b(-l - pl'))\ 
P P 

a(uj, x)A(ip, ip) 

(a( — k — pk') — x, b(—l — pi') — uj)dudx\ 
P P 

< cJJ e -0M-«M 



x / e -±\x\-^\a{±;k-pk')-x\ dx 



7T I 7TS D 1 

= C<f>(p, —D, b(-l - pl'))4>(a, —,a(-k - pk')) 
4s p 4 p 



/ — aar\ -Krk— k' I 

x(e 



For Theorem 4.1 we first prove the following lemmas.

Lemma 1.1: Assume the hypotheses of Theorem 4.1. Then



Proof: 



\lCl R -log(l + P kl (A*A) klkl )\ 
< 2KL\og(l + 0{e-^ (3+a) )^ 



[A*A) klk , v \ 



AjyfcjA 3 yfc/j/| 



7 



- E \ A jj'kl\\Ajj'k>l>\ 



x y2^ e - aa p\k—^j\ + e -^a P \k-^j\^ 



3& 



/ — aap\ -yi- k' , — ns . D ap\ -\i — k' I \-i 

x(e p^ + e 4 p 2 )} 



+2e~ 



Using Lemma [2761 

|JC£ fl -log(l + fl«(A*A) WH )| 
< 2ifLlog(l + 0(e-4 (/3+a) )) 



Lemma 1.2: Let 5 = cr(Jcr. Then 

l(A*A kw -^ fc ,p^l^(^ 

Proof: We first look at (A*A)fcifc;. The diagonal entries 
of A* A are 

(A*A) WW = J2 l<L^>fe<>| 2 



+e -&06„(|I-4 r j'|+|^i'-I|)j 



jGZ 

+2e 



-aap(|fc--4-j"| + |-t-i-fc'|) 



nc II 2 



p p 



-aar | fe- -4- j | - ap| -i j - fc' 



since $(\psi^{c}, \frac{a}{\rho}, \frac{b}{\rho})$ is a tight Gabor frame (Proposition 3.5).

l.nc || 2 



+ e -^^(\k-jiJ\ + \jiJ-k\)^ 
< c ,^-/J6pJi^i +e - 1 g.IJ6p|I-l'|j 

Ife— fc 1 tvsD I J I / I 

x{e~ aap ^~^ + e — §- a Pl fe - fe I) 
where we have used Lemmas 3.6 and 2.2. Next
J2 \(A*A) klkn ,\ 



= Ha,W(^,^)> 
S(x, u)) 



k=-K,...,K,ky^k' 

l=-L,...,L,W 



< 



c E (< 

k=-K,..,K,k^k' 



\k-k'\ 



+ e -ZfL ap \k-k'\j l8 - ) 



R 2 

By the Riemann-Lebesgue Lemma, 

||<5'||oo < yy \S(u,x)\duidx 

< 9C 2 [[ -^-e-^-^dudx 
J J a(3 

9 • 2 4 C* 2 



l = -L,..X,l^l> 
1 e -aap/2 



2 ' 



O ( 



e s 



1 — e^ aa P/ 2 1 



x ( 



-/3fcp/2 



pap) 
fc£>6p 



1 _ e -/3fcp/2 1 -^g-^D&p) 



(19) 



Now we set 



f3 a a ( (3 

a= -, o= -, s= - = - 
a p b \a 



Then 



aa = (3, sa 



fib = a and - = [ — 

s \(3 



and equation (19) becomes

= o ( 



-(l^Dp/S 



X ( 



-ap/2 



-(£ ) 3 77Dp/8 



1 - e-"<V 2 1 _ e -(f) 37r£) p/8 



We use Lemma 2.1 and the fact that $\iint W(\psi, \psi)(\omega, x)\, d\omega\, dx = \|\psi\|_2^2 = 1$ [14].

-S(p-k,p%l)\ 
a p 

= | / S(x,uj)W('ip,'ip)(x — p—k,u> — p—l)dujdx 

-S(p^l,p-k)\ 
p a 

(3 a 
Six + p—k, uj + p—l)W(ip, ip)(x, u>)dujdx 
a [3 

-S(p^l,p-k)\ 

= \f [six + ph^ + p-n-sip-Lph)] 

J R 2 a (3 pa 

W(ip,ip)(x,u))du)dx)\ 

< Halloo / (\x\ + \u\)W(il>,ip)(x,w)\dudx 

JWL 2 



O 



e 2 



(/3+a) 



(1 - e-^^ D P/ s )(l - e r ( -f )37rDp/s ) I 
O (e-f0W). ( 20) 



< C 
= C 



1 



(a/3) 2 
1 



(\x\ + \uj\)e-^ x \-^ D ^dujdx 



{a/3D) 2 

These two bounds prove the lemma. 



s 



Lemma 1.3: Let S = after and S + (x,oj) = (S(x,u>)) 
Then 



log(l + A fc ,,(A*A)) - log(l + S + ( P ^k, p-l)) 

a p 



= log HO e 



Proof: Using Lemmas 1.1 and 1.2,



{a(3Df 



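$$\left|\log\big(1+\lambda_{k,l}(A^*A)\big) - \log\Big(1+S^+\big(\rho\tfrac{\beta}{\alpha}k,\ \rho\tfrac{\alpha}{\beta}l\big)\Big)\right|$$

$$\le \left|\log\big(1+\lambda_{k,l}(A^*A)\big) - \log\big(1+(A^*A)_{(k,l),(k,l)}\big)\right| + \left|\log\big(1+(A^*A)_{(k,l),(k,l)}\big) - \log\Big(1+S^+\big(\rho\tfrac{\beta}{\alpha}k,\ \rho\tfrac{\alpha}{\beta}l\big)\Big)\right|$$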


K0+CX) 



(af3Df 



log 1 + e~* 



Proof: [Proof of Theorem 4.1] The proof follows from
Lemmas 1.1 and 1.3. ■

Proof: [Proof of Theorem 4.2] Applying Lemma 2.8 to
Lemma 1.3 yields the result. ■

Proof: [Proof of Theorem |3j] We set K = |_-J = 
and L = L^^rJ- Also, set S n = On$a„ and {S+{x,lu) = 
$(S(x,\omega))_+$. Theorem 4.1 yields the following:

lim JC an .R n 



lim 



L ^ ^ , / s+( P ^k,pf-iy 



L„ 



E E 1+ 



-ir 



lim „ -r- log 



rwoo 2W/J 2 /?„ 



-ir 



ir 



iia ^ , /. . iM^or 



E log i 



1 / \h(oj)\ 2 \ , 

Therefore, up to the redundancy, we recover the normalized 
time-invariant information capacity: 

lim 2 CT „, fl „ = -— / log l + LUL U. 

n^oo p 2W J_ w \ f] 1 



Proof: [Proof of Theorem 5.2] This proof relies on
Theorem 4.2 in the same way that the proof of Theorem 5.1
relies on Theorem 4.1. The only difference is that one must


consider the convergence of {-P^"}^io z=-l to ^ > ( aj )- This 
convergence is given by the fact that as $\beta_n \to \infty$, the
approximate eigenvalues converge to the exact eigenvalues
(Lemma 1.3). Consequently, the estimated power allocation
also converges to the optimal power allocation, Lemma Q3] 
The optimal power allocation for each $\alpha_n$ then converges to
$P(\omega)$ as $n \to \infty$ since the values $S^+(\rho\tfrac{\beta}{\alpha}k, \rho\tfrac{\alpha}{\beta}l)$ converge to
samples of $|\hat{h}(\omega)|^2$.



We note that the factor 2KL inside the log in the statement 
of Theorem 14.21 is now 2W^. Therefore, the error decay 
given in Theorem 4.2 will only be of the order $O(\beta^{-1})$. ■

Appendix II 
Lemmas Used 

Lemma 2.1: If $|\psi(x)| \le Ce^{-c_1|x|}$ and $|\hat{\psi}(\omega)| \le Ce^{-c_2|\omega|}$
for $c_1, c_2 > 0$, then

$$|W(\psi,\psi)(x,\omega)| \le C^2 e^{-\frac{1}{2}(c_1|x|+c_2|\omega|)}$$

and

$$|A(\psi,\psi)(x,\omega)| \le C^2 e^{-\frac{1}{2}(c_1|x|+c_2|\omega|)}.$$

Proof: The proof is contained in the proof of Theorem 2.4 in [29], when one views both distributions as short-time Fourier transforms, as explained in [14]. ■
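Lemma 2.2: For $\alpha$ and $\rho > 0$,

$$\sum_{j\in\mathbb{Z}} e^{-\alpha(|k-\rho j|+|\rho j-k'|)} \le C e^{-\frac{\alpha}{2}|k-k'|}$$

and for $\alpha, \beta, \rho > 0$, $\alpha \ne \beta$,

$$\sum_{j\in\mathbb{Z}} e^{-\alpha|k-\rho j|-\beta|\rho j-k'|} \le C\left(e^{-\beta\frac{|k-k'|}{2}} + e^{-\alpha\frac{|k-k'|}{2}}\right).$$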

Proof: W.l.o.g., $k \le k'$.

• for $\rho j \in [k, k']$, $|k - \rho j| + |k' - \rho j| = |k - k'|$

• for $\rho j > k'$, $\rho j = k' + b$, $b > 0$, $|k - \rho j| + |k' - \rho j| = |k - (k' + b)| + |b| = |k - k'| + 2b$

• for $\rho j < k$, $\rho j = k - b$, $b > 0$, $|k - \rho j| + |k' - \rho j| = |b| + |k' - (k - b)| = |k - k'| + 2b$


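$$\sum_{j\in\mathbb{Z}} e^{-\alpha(|k-\rho j|+|k'-\rho j|)} = \lfloor 1+\rho|k-k'| \rfloor e^{-\alpha|k-k'|} + 2\sum_{j=1}^{\infty} e^{-2\alpha\rho j} e^{-\alpha|k-k'|}$$

$$= \lfloor 1+\rho|k-k'| \rfloor e^{-\alpha|k-k'|} + 2e^{-\alpha|k-k'|}\left(\frac{1}{1-e^{-2\rho\alpha}} - 1\right)$$

$$= \lfloor 1+\rho|k-k'| \rfloor e^{-\alpha|k-k'|} + 2e^{-\alpha|k-k'|}\,\frac{e^{-2\rho\alpha}}{1-e^{-2\rho\alpha}}.$$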

Again, w.l.o.g., $k \le k'$.

oo 

e -ot\k-pj\-0\pj-k'\ < e -q|fe-fc'| e -(a+g)pj 



3>m 



Similarly, 



3=0 



= £7 e - Q l fc - fc ' 



(21) 



J<L5J 



Set TV = 



_ r \h-K 



E « 

L|J<3<L|J 



-a|fe-pj|-/8|pj-fc'| 



9 



< 



-<*Pj e -P(\k-k'\-pj) 



N 



+ e 



-a(|fc— fc'|— pj) „-ppj 



i'=l 



AT 



+e 



C(e" 



—a I fc — k 



,|/2 E 

ife— fe ; [ 



) 



(23) 



Finally, (21) + (22) + (23) $\le C\left(e^{-\alpha|k-k'|/2} + e^{-\beta|k-k'|/2}\right)$.



Definition 2.3: [30] Let $x, y \in \mathbb{R}^n$ satisfy $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$. Let $x'$ and $y'$ be the vectors given by reordering
$x$ and $y$ so that $x'_i \ge x'_j$ and $y'_i \ge y'_j$ for all $1 \le i < j \le n$.
Then $y$ majorizes $x$, denoted $x \prec y$, if

$$\sum_{i=1}^{j} x'_i \le \sum_{i=1}^{j} y'_i \quad \text{for } j = 1, \dots, n.$$

Theorem 2.4: [30] Let $P$ be an $n \times n$ Hermitian matrix with
diagonal entries $h_1, \dots, h_n$ and eigenvalues $\lambda_1, \dots, \lambda_n$. Then $h \prec \lambda$.

Proposition 2.5: [30] If $x, y \in \mathbb{R}^n$ and $x \prec y$, then

$$\prod_{i=1}^n x_i \ge \prod_{i=1}^n y_i.$$
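As a small illustration of Definition 2.3, Theorem 2.4 and Proposition 2.5, the sketch below checks majorization for one hand-picked case. The helper `majorizes` and the example matrix are assumptions made for illustration: the symmetric matrix with rows (2, 1) and (1, 2) has diagonal (2, 2) and eigenvalues (3, 1).

```python
from math import prod


def majorizes(y, x, tol=1e-12):
    """Return True if y majorizes x (x ≺ y) in the sense of Definition 2.3."""
    if len(x) != len(y) or abs(sum(x) - sum(y)) > tol:
        return False
    xs = sorted(x, reverse=True)  # x', entries in decreasing order
    ys = sorted(y, reverse=True)  # y', entries in decreasing order
    partial_x = partial_y = 0.0
    for xi, yi in zip(xs, ys):
        partial_x += xi
        partial_y += yi
        # Every partial sum of x' must stay below the one of y'.
        if partial_x > partial_y + tol:
            return False
    return True


# Diagonal and eigenvalues of [[2, 1], [1, 2]] (illustrative example).
diag = [2.0, 2.0]
eigs = [3.0, 1.0]

assert majorizes(eigs, diag)      # h ≺ λ, as in Theorem 2.4
assert prod(diag) >= prod(eigs)   # 4 ≥ 3, as in Proposition 2.5
```

The product inequality is the step that lets the diagonal entries stand in for the eigenvalues in Corollary 2.6.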

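Corollary 2.6: Let $A$ be an $n \times n$ Hermitian matrix, satisfying

$$\sum_{j=1,\dots,n,\ j \ne i} |A_{ij}| \le \epsilon \quad \text{for all } i.$$

Then

$$\sum_{1 \le i \le n} \log(1 + A_{ii}) - \sum_{1 \le i \le n} \log(1 + \lambda_i) \le n\log(1 + \epsilon).$$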

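Proof: First, by Theorem 2.4 and Proposition 2.5,
$\prod_{i=1}^n A_{ii} \ge \prod_{i=1}^n \lambda_i$, and hence $\sum_{1 \le i \le n} \log(1 + A_{ii}) \ge \sum_{1 \le i \le n} \log(1 + \lambda_i)$. By the Gershgorin Disc Theorem [31]
there exist $\epsilon_i$ such that $A_{ii} = \lambda_i + \epsilon_i$ and $|\epsilon_i| \le \epsilon$ for all $i$.
By Proposition 2.5,

$$\log\prod_{i=1}^n (1 + A_{ii}) - \log\prod_{i=1}^n (1 + \lambda_i) = \log\prod_{i=1}^n \frac{1 + \lambda_i + \epsilon_i}{1 + \lambda_i} \le n\log(1 + \epsilon).$$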
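The bound of Corollary 2.6 can be sanity-checked on a $2 \times 2$ example, where the eigenvalues have a closed form. This is only a sketch under assumed values: the matrix entries `a`, `b`, `d` and the helper name are hypothetical and not taken from the paper.

```python
from math import log, sqrt


def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return mean - radius, mean + radius


# Illustrative matrix [[1, 0.1], [0.1, 2]] (assumed values).
a, b, d = 1.0, 0.1, 2.0
eps = abs(b)  # each off-diagonal row sum equals |b|
n = 2

lam1, lam2 = eigenvalues_2x2_symmetric(a, b, d)
diag_sum = log(1 + a) + log(1 + d)
eig_sum = log(1 + lam1) + log(1 + lam2)
gap = diag_sum - eig_sum

# The gap is non-negative and bounded by n * log(1 + eps).
assert 0.0 <= gap <= n * log(1 + eps)
```

Here $\det(I + A) = 2 \cdot 3 - 0.01 = 5.99$, so the gap is $\log(6/5.99)$, well below $2\log(1.1)$.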



Lemma 2.7: Consider water-filling performed on two SNR
sequences $\{N_1, \dots, N_L\}$ and $\{M_1, \dots, M_L\}$ with common total
power constraint $P$. If $|N_l - M_l| \le \epsilon$ for all $l$, then the water-filling allocations $\{P_1^N, \dots, P_L^N\}$ and $\{P_1^M, \dots, P_L^M\}$ satisfy
$|P_l^N - P_l^M| \le (L + 1)\epsilon$ for all $l$.

Proof: Recall that for a real number $x$, $(x)_+ = \max(0, x)$. Define $\phi^N(x) = \sum_{l=1}^L (x - N_l)_+$ and $\phi^M(x) = \sum_{l=1}^L (x - M_l)_+$. If $|N_l - M_l| \le \epsilon$ for all $l$, then $|\phi^N(x) - \phi^M(x)| \le L\epsilon$ for all $x$. Both $\phi^N$ and $\phi^M$ are strictly
increasing for $x > N_1$ and $x > M_1$ respectively. If, without
loss of generality, $\phi^N(x_0) = P$ and $\phi^M(x_0) < P$, then
$\phi^N(x_0) - \phi^M(x_0) \le L\epsilon$. Since the slope of $\phi^M$ is at least one,
$\phi^M(x_0^M) = P$ for some $x_0^M \le x_0 + L\epsilon$. Let $P_l^N = (x_0 - N_l)_+$ and $P_l^M = (x_0^M - M_l)_+$. Then $|P_l^N - P_l^M| \le (L + 1)\epsilon$.
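The perturbation bound of Lemma 2.7 can be illustrated with a short water-filling routine. This is a sketch under assumptions: the function `water_fill`, the level sequences `N` and `M`, and the power budget `P` are illustrative choices, and the levels play the role of the $N_l$ in $\phi^N(x) = \sum_l (x - N_l)_+$.

```python
def water_fill(levels, total_power):
    """Solve sum_l (x - N_l)_+ = total_power for the water level x and
    return the allocations (x - N_l)_+. Assumes total_power > 0."""
    ordered = sorted(levels)
    # Try the hypothesis that the `active` lowest levels receive power,
    # starting from all of them and dropping the highest one each time.
    for active in range(len(ordered), 0, -1):
        water = (total_power + sum(ordered[:active])) / active
        if water > ordered[active - 1]:
            break  # water level sits above every active level
    return [max(0.0, water - n) for n in levels]


# Two level sequences that differ entrywise by at most eps (assumed values).
N = [1.0, 2.0, 4.0]
M = [1.1, 1.9, 4.05]
eps = max(abs(n - m) for n, m in zip(N, M))
P = 3.0

alloc_N = water_fill(N, P)
alloc_M = water_fill(M, P)
worst = max(abs(p - q) for p, q in zip(alloc_N, alloc_M))

# Lemma 2.7: allocations differ by at most (L + 1) * eps.
assert worst <= (len(N) + 1) * eps
```

In this example the observed deviation is close to `eps` itself, so the factor $L + 1$ is a worst-case bound and not a typical value.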



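Lemma 2.8: Consider a standard matrix channel given by
$y = Ax + n$, where $A : \mathbb{C}^m \to \mathbb{C}^n$ and $n \sim \mathcal{N}_{\mathbb{C}}(0, \eta^2 I)$
and $\eta^2 > 0$. Let the eigenvalues of $A^*A$ be denoted
$\{\lambda_1, \dots, \lambda_n\}$ and let the power allocated according to water-filling for the SNR values $\{\eta^2/\lambda_l\}$ be $\{P_1^\lambda, \dots, P_n^\lambda\}$. Let
water-filling also be performed on the approximate SNR
values $\{\eta^2/\mu_1, \dots, \eta^2/\mu_n\}$, where $\{\mu_1, \dots, \mu_n\}$ are approximations
to the eigenvalues of $A^*A$, and denote this power allocation
$\{P_1^\mu, \dots, P_n^\mu\}$. If $|\lambda_l - \mu_l| \le \epsilon$ for all $l = 1, \dots, n$, then also
for $l = 1, \dots, n$,

$$\left|\log\left(1 + \frac{P_l^\lambda \lambda_l}{\eta^2}\right) - \log\left(1 + \frac{P_l^\mu \mu_l}{\eta^2}\right)\right| = \log(1 + (n+1)O(\epsilon)).$$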

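Proof: The values $N_l$ from Lemma 2.7 correspond to
$N_l = \frac{\eta^2}{\lambda_l}$ and $M_l = \frac{\eta^2}{\mu_l} = \frac{\eta^2}{\lambda_l + \epsilon_l}$, where $|\epsilon_l| \le \epsilon$. Then $|N_l - M_l| = \eta^2\left|\frac{1}{\lambda_l} - \frac{1}{\lambda_l + \epsilon_l}\right|$. Let $P_l^\mu = P_l^\lambda + \delta_l$. By Lemma 2.7, $|\delta_l| \le$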

2 



A, I A;+e f 



. Now 



log(l 



2 ") - log(l + , 



= |log( 



P A A, 



P A A/ 



< 



< 



I log(l 4 

i log 



P; A |£j| , 1 (n + l) V 2 e 



P? 



Xl T 1 2 +P l X ^ 

1 (n + l)rj 2 
Xi r} 2 + P A 